Compiler Optimizations for Eliminating Cache Conflict Misses
Abstract
Limited set-associativity in hardware caches can cause conflict misses when multiple data items map to the same cache locations. Conflict misses have been found to be a significant source of poor cache performance in scientific programs, particularly within loop nests. We present two compiler transformations to eliminate conflict misses: (1) modifying variable base addresses and (2) padding inner array dimensions. Unlike compiler transformations that restructure the computation performed by the program, these two techniques modify its data layout. Using cache simulations of a selection of kernels and benchmark programs, we show that these compiler transformations can eliminate conflict misses for applications with regular memory access patterns. Cache miss rates for a 16K direct-mapped cache are reduced by 35% on average for each program. For some programs, execution times on a DEC Alpha can be improved by up to 60%.
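To make the two data-layout transformations concrete, the following is a minimal sketch of a direct-mapped cache model. It is not the authors' simulator or compiler pass. The 16K cache size comes from the abstract; the 32-byte line size, the 8-byte element size, the array sizes, and all function names are assumptions chosen for illustration. The sketch counts misses for a loop that reads two data streams in lockstep, first when the streams map to the same cache lines and then after one cache line of padding is added, either between two variables (shifting a base address) or at the end of each row (padding the inner array dimension).

```python
CACHE_BYTES = 16 * 1024   # 16K direct-mapped cache, as in the abstract
LINE_BYTES = 32           # assumed cache line size
ELEM_BYTES = 8            # assumed element size (double precision)
NUM_LINES = CACHE_BYTES // LINE_BYTES


def count_misses(addresses):
    """Return the number of misses a direct-mapped cache takes on a trace."""
    resident = {}  # cache index -> memory line currently stored there
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        index = line % NUM_LINES
        if resident.get(index) != line:
            misses += 1
            resident[index] = line
    return misses


def lockstep_trace(base_a, base_b, n):
    """Byte addresses touched by: for i in range(n): s += a[i] + b[i]."""
    for i in range(n):
        yield base_a + i * ELEM_BYTES
        yield base_b + i * ELEM_BYTES


def adjacent_rows_trace(row_elems, n):
    """Addresses for reading x[0][j] and x[1][j] in a row-major 2-D array."""
    return lockstep_trace(0, row_elems * ELEM_BYTES, n)


N = 2048  # 2048 elements * 8 bytes = 16K, exactly one cache's worth

# Transformation 1: modify a variable's base address.
# b placed directly after a: a[i] and b[i] always share a cache index.
conflict_bases = count_misses(lockstep_trace(0, N * ELEM_BYTES, N))
# b shifted by one cache line of padding: the streams no longer collide.
padded_bases = count_misses(lockstep_trace(0, N * ELEM_BYTES + LINE_BYTES, N))

# Transformation 2: pad the inner array dimension.
# Row length equal to the cache size makes adjacent rows collide.
conflict_rows = count_misses(adjacent_rows_trace(N, N))
# Four extra elements (one line) per row separate the two rows in the cache.
padded_rows = count_misses(adjacent_rows_trace(N + LINE_BYTES // ELEM_BYTES, N))

print(conflict_bases, padded_bases)  # 4096 1024
print(conflict_rows, padded_rows)    # 4096 1024
```

In this toy setting every access misses before padding, because each reference evicts the line the other stream needs next. After padding, only the one unavoidable miss per cache line remains. These counts describe this contrived trace only and say nothing about the miss-rate reductions reported in the abstract.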
Similar resources
Compiler Optimizations for High Performance Architectures
We describe two ongoing compiler projects for high performance architectures, both being developed at the University of Maryland using the Stanford SUIF compiler infrastructure. First, we are investigating the impact of compilation techniques for eliminating synchronization overhead in compiler-parallelized programs running on software distributed-shared-memory (DSM) systems. Second, we are evaluatin...
Evaluating a Model for Cache Conflict Miss Prediction
Cache conflict misses can cause severe degradation in application performance. Previous research has shown that for many scientific applications the majority of cache misses are due to conflicts in the cache. Although conflicts in the cache are a major concern for application performance, it is often difficult to eliminate them completely. Eliminating conflict misses requires detailed knowledge of the cach...
Adaptive Cache Placement for Scientific Computation
The central data structures for many applications in scientific computing are large multidimensional arrays. These arrays dominate memory accesses and are often accessed with strides that vary across orthogonal dimensions, which makes developing effective caching strategies a central and critical challenge. We propose a novel technique to optimize cache placement for multidimensional arrays with the ...
Model-Driven Automatic Tiling with Cache Associativity Lattices
Traditional compiler optimization theory distinguishes three separate classes of cache miss – Cold, Conflict and Capacity. Tiling for cache is typically guided by capacity miss counts. Models of cache function have not been effectively used to guide cache tiling optimizations due to model error and expense. Instead, heuristic or empirical approaches are used to select tilings. We argue that con...
Fast and Efficient Partial Code Reordering: Taking Advantage of Dynamic Recompilation
Poor instruction cache locality can degrade performance on modern architectures. For example, our simulation results show that eliminating all instruction cache misses improves performance by as much as 16% for a modestly sized instruction cache. In this paper, we show how to take advantage of dynamic code generation in a Java Virtual Machine (VM) to improve instruction locality at run-time. We...